Implement Clustering and App related functions #4407
base: master
Conversation
naiming-zededa commented Oct 30, 2024
- change VMI to VMI ReplicaSet for kubernetes
- change Pod to Pod ReplicaSet for containers
- change functions handling replicaset names in services
- subscribe EdgeNodeInfo in domainmgr, zedmanager to get node-name for cluster
- add Designated Node ID to several structs for App
- not to delete domain from kubernetes if not a Designated App node
- parse config for EdgeNodeClusterConfig in zedagent
- handle ENClusterAppStatus publication in zedmanager in the multi-node clustering case
- zedmanager handles effective-activation including ENClusterAppStatus
- kubevirt hypervisor changes to handle VMI/Pod ReplicaSets
Force-pushed from 631cc70 to 1aed95e
}
if err := hyper.Task(status).Cleanup(status.DomainName); err != nil {
log.Errorf("failed to cleanup domain: %s (%v)", status.DomainName, err)
// in cluster mode, we can not delete the pod due to failing to get app info
Was the issue this is fixing something appearing when there is a failover/takeover and another node in the cluster starts running the app instance?
Or is it something which could happen when an app instance is first provisioned on the cluster?
This can happen even on the first node of the app deployment: sometimes we can not get the status from the k3s cluster, or sometimes it takes time for the app to come up to the running state. But we should not remove the kubernetes configuration; it has the config stored in the database, and it has its own scheduling and control process to eventually bring it to the intended state. If we delete the config from the cluster, then we need to wait another 10 minutes to retry, etc., and it will cause confusion.
So a new boolean, DomainConfigDeleted, is introduced in the DomainStatus. It allows the Designated node, if it knows for sure the app instance has been removed from the device, to go ahead and delete the app/domain from the cluster.
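For illustration, a minimal sketch of what this gating could look like (simplified stand-in types; not the PR's exact code):

```go
// Sketch only: gate the Kubernetes-side delete on being the Designated App
// node and on the config really being deleted. DomainStatus here is a
// trimmed-down stand-in for the pillar type.
package main

import "fmt"

type DomainStatus struct {
	DomainName          string
	IsDNidNode          bool // this node is the Designated App node
	DomainConfigDeleted bool // the app instance config was removed from the device
}

// mayDeleteFromCluster reports whether the domain may be removed from the
// cluster, or must be left for the Kubernetes controller to reconcile.
func mayDeleteFromCluster(hvTypeKube bool, status DomainStatus) bool {
	if !hvTypeKube {
		return true // non-kubevirt hypervisors always clean up locally
	}
	return status.IsDNidNode && status.DomainConfigDeleted
}

func main() {
	st := DomainStatus{DomainName: "app1", IsDNidNode: true, DomainConfigDeleted: false}
	fmt.Println(mayDeleteFromCluster(true, st)) // false: keep the cluster config
}
```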
It would be good to capture the above explanation either in a comment here or in pkg/pillar/docs/zedkube.md
Sure. I documented this in zedkube.md and referenced it from domainmgr.go.
Hmm - re-reading it and it still looks odd.
Does the kubevirt Info() return an error or does it return HALTED when the issue is merely that it can't (yet) fetch the status from k3s?
Can we not fix that Info() to return something more accurate?
If it returns an error or HALTED then this function will set an error, and that error might be propagated to the controller, which would be confusing if the task is slow at starting, or is starting on some other node.
So I think this check is in the wrong place.
The current logic in kubevirt.go is:
- if the app is not found, it returns status "" and an error: logError("getVMIStatus: No VMI %s found with the given nodeName %s", repVmiName, nodeName)
- if the app is found on another node in the cluster, it returns status "nolocal" and no error
- if the app is found on this node, it returns whatever the kubernetes app running status is
With the above conditions, if an error is returned, the status is set to types.Broken.
We further map the status string with this mapping:
// Use few states for now
var stateMap = map[string]types.SwState{
"Paused": types.PAUSED,
"Running": types.RUNNING,
"NonLocal": types.RUNNING,
"shutdown": types.HALTING,
"suspended": types.PAUSED,
"Pending": types.PENDING,
"Scheduling": types.SCHEDULING,
"Failed": types.FAILED,
}
This is a good point; the error condition when the app is not yet running would be confusing. I can change condition 1) above to 'Scheduling', which is a currently defined state and roughly reflects the kubernetes app status.
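A rough sketch of how that could look, using the state names from the snippet above (illustrative stand-in types, not the PR's exact code):

```go
// Sketch only: map the status string reported by kubevirt to a SwState,
// treating "app not visible in the cluster yet" as Scheduling instead of
// Broken. SwState and its values are simplified stand-ins for pillar types.
package main

import "fmt"

type SwState int

const (
	UNKNOWN SwState = iota
	SCHEDULING
	PENDING
	RUNNING
	PAUSED
	HALTING
	FAILED
)

var stateMap = map[string]SwState{
	"Paused":     PAUSED,
	"Running":    RUNNING,
	"NonLocal":   RUNNING,
	"shutdown":   HALTING,
	"suspended":  PAUSED,
	"Pending":    PENDING,
	"Scheduling": SCHEDULING,
	"Failed":     FAILED,
}

// mapVMIStatus treats an empty status (VMI/Pod not found yet) as Scheduling
// rather than returning an error that would mark the app Broken.
func mapVMIStatus(status string) SwState {
	if status == "" {
		return SCHEDULING
	}
	if st, ok := stateMap[status]; ok {
		return st
	}
	return UNKNOWN
}

func main() {
	fmt.Println(mapVMIStatus(""))        // prints the SCHEDULING value
	fmt.Println(mapVMIStatus("Running")) // prints the RUNNING value
}
```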
Force-pushed from 1aed95e to 2e3e102
Force-pushed from 2e3e102 to 8a13e80
Force-pushed from 8a13e80 to b7140d2
Force-pushed from b7140d2 to 2453125
I have rebased and resolved the conflicts. Please review and see if the PR is ok.
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4407      +/-   ##
==========================================
- Coverage   20.93%   20.90%    -0.03%
==========================================
  Files          13       13
  Lines        2895     2894        -1
==========================================
- Hits          606      605        -1
  Misses       2163     2163
  Partials      126      126

☔ View full report in Codecov by Sentry.
@@ -3603,3 +3703,16 @@ func lookupCapabilities(ctx *domainContext) (*types.Capabilities, error) {
}
return &capabilities, nil
}

func getnodeNameAndUUID(ctx *domainContext) error {
func getnodeNameAndUUID(ctx *domainContext) error {
func (ctx *domainContext) retrieveNodeNameAndUUID() error {
as this is not a getter method
Ok.
pkg/pillar/hypervisor/kubevirt.go
Outdated

// Add pod non-image volume disks
if len(diskStatusList) > 1 {
leng := len(diskStatusList) - 1
What does the "g" in "leng" stand for?
changed to 'length'.
I don't see any tests.
Force-pushed from 2453125 to acd9962
Added hypervisor/kubevirt_test.go now.
Force-pushed from acd9962 to 1129c28
- change VMI to VMI ReplicaSet for kubernetes
- change Pod to Pod ReplicaSet for containers
- change functions handling replicaset names in services
- subscribe EdgeNodeInfo in domainmgr, zedmanager to get node-name for cluster
- add Designated Node ID to several structs for App
- not to delete domain from kubernetes if not a Designated App node
- parse config for EdgeNodeClusterConfig in zedagent
- handle ENClusterAppStatus publication in zedmanager in the multi-node clustering case
- zedmanager handles effective-activation including ENClusterAppStatus
- kubevirt hypervisor changes to handle VMI/Pod ReplicaSets

Signed-off-by: Naiming Shen <[email protected]>
Force-pushed from 1129c28 to 7db2529
@@ -3603,3 +3709,16 @@ func lookupCapabilities(ctx *domainContext) (*types.Capabilities, error) {
}
return &capabilities, nil
}

func (ctx *domainContext) retrieveNodeNameAndUUID() error {
This is our own node's nodename, right?
Can't it retrieve that at domainmgr startup?
And why is "UUID" part of the name of the function?
This part I used to do at the startup of domainmgr, but 'waitEdgeNodeInfo' only waits for a maximum of 5 minutes, so I could not rely on it. In a previous review of this PR you pointed this out and I removed it. So, good point, I should also remove this function here.
I used to have UUID in the function name in code before this PR; I will remove that.
if err := hyper.Task(status).Cleanup(status.DomainName); err != nil {
log.Errorf("failed to cleanup domain: %s (%v)", status.DomainName, err)
// in cluster mode, we can not delete the pod due to failing to get app info
if !ctx.hvTypeKube {
And I get the feeling that most or all of these hvTypeKube checks should not be in domainmgr but inside the kubevirt hypervisor package.
If for instance zedmanager is telling domainmgr to shut down or delete a task and if for kubevirt this isn't needed except on some designated node, can't that check whether designated or not be done inside the hypervisor/kubevirt code?
I can move this into the hypervisor functions. I think mainly we need to pass in whether this domain config is being deleted or not (DomainStatus.DomainConfigDeleted), so we need to change the API from '.Delete(domainName string)' to '.Delete(status *types.DomainStatus)' (which will also change the other hypervisors).
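A sketch of the shape of that API change (trimmed-down stand-in types, only the Delete method shown; not the PR's exact code):

```go
// Sketch only: pass the full DomainStatus to Delete so the kubevirt
// implementation can see DomainConfigDeleted and the designated-node flag.
package main

import "fmt"

type DomainStatus struct {
	DomainName          string
	DomainConfigDeleted bool
	IsDNidNode          bool
}

// Before: Delete(domainName string) error
// After (proposed): Delete(status *DomainStatus) error
type Hypervisor interface {
	Delete(status *DomainStatus) error
}

type kubevirtContext struct{}

func (ctx kubevirtContext) Delete(status *DomainStatus) error {
	if !status.IsDNidNode || !status.DomainConfigDeleted {
		// Leave the replicaSet alone; the cluster controller keeps managing it.
		return nil
	}
	fmt.Printf("deleting %s from the cluster\n", status.DomainName)
	return nil
}

func main() {
	var h Hypervisor = kubevirtContext{}
	_ = h.Delete(&DomainStatus{DomainName: "app1", IsDNidNode: true, DomainConfigDeleted: true})
}
```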
Did you move it there? If so, this change and the preceding comment can be removed from verifyStatus(), right?
if zcfgCluster == nil {
log.Functionf("parseEdgeNodeClusterConfig: No EdgeNodeClusterConfig, Unpublishing")
pub := ctx.pubEdgeNodeClusterConfig
items := pub.GetAll()
Is there only a global key for this? If so there is no need to call GetAll - just call Unpublish("global")
Otherwise call GetAll and walk the set of returned objects and unpublish each of their keys.
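Roughly, the two options look like this (Publication is a stand-in for the pillar pubsub interface; illustrative only):

```go
// Sketch only: unpublish the EdgeNodeClusterConfig either by its single
// "global" key or by walking every published key. fakePub stands in for
// the real pubsub publication type.
package main

import "fmt"

type Publication interface {
	GetAll() map[string]interface{}
	Unpublish(key string) error
}

type fakePub struct{ items map[string]interface{} }

func (p *fakePub) GetAll() map[string]interface{} { return p.items }
func (p *fakePub) Unpublish(key string) error {
	delete(p.items, key)
	return nil
}

// Option 1: the config only ever lives under the "global" key.
func unpublishGlobal(pub Publication) { _ = pub.Unpublish("global") }

// Option 2: walk whatever keys exist and unpublish each of them.
func unpublishAll(pub Publication) {
	for key := range pub.GetAll() {
		_ = pub.Unpublish(key)
	}
}

func main() {
	pub := &fakePub{items: map[string]interface{}{"global": struct{}{}}}
	unpublishAll(pub)
	fmt.Println(len(pub.GetAll())) // 0
}
```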
Ok.
handleENClusterAppStatusImpl(ctx, key, nil)
}

func handleENClusterAppStatusImpl(ctx *zedmanagerContext, key string, status *types.ENClusterAppStatus) {
This function should be updated from our POC branch. Or I can do it as part of my PR for failover handling.
Actually I suggest amending this commit with latest handleclusterapp.go and applogs.go from POC branch.
In that way they can be reviewed as a unit and my follow up PRs need not include those files.
@@ -154,7 +154,7 @@ func (z *zedkube) checkAppsStatus() {
}

pub := z.pubENClusterAppStatus
stItmes := pub.GetAll()
stItems := pub.GetAll()
Please update this function from the POC branch, it changed a lot.
@zedi-pramodh as discussed, since some of the types/booleans changed, updating the PoC code in applog.go would require updates to many other files. I'll leave it to your later PR to add those changes.
ok no problem, I will take care of those and submit my PR once this one is merged.
- changed the function to retrieveDeviceNodeName() and call it only at the start of domainmgr Run()
- remove the ctx.hvTypeKube and status.IsDNidNode checks in the domainmgr.go code; also remove status.DomainConfigDeleted. We now rely on the normal domain handling of the delete/cleanup workflow
- fixed a bug where nodeName contained an underscore, which is not allowed in kubernetes names
- changed the zedmanager/handleclusterstatus.go code to the PoC code base, and commented out one line for a later PR to handle
- implemented the scheme where, when kubevirt can not contact the kubernetes API-server or the cluster does not have the POD/VMI scheduled yet, we return the 'Unknown' status. It keeps a starting 'unknown' timestamp per application
- also, if the 'unknown' status lasts longer than 5 minutes, it changes into 'Halt' status back to domainmgr
- updated 'zedkube.md' section 'Handle Domain Apps Status in domainmgr' for the above behavior

Signed-off-by: Naiming Shen <[email protected]>
Force-pushed from 3e80fde to ee49db4
}
enInfo := NodeInfo.(types.EdgeNodeInfo)
ctx.nodeName = strings.ReplaceAll(strings.ToLower(enInfo.DeviceName), "_", "-")
log.Noticef("retrieveDeviceNodeName: devicename, NodeInfo %v", NodeInfo) // XXX
Why the XXX comment?
// Get the EdgeNode info, needed for kubevirt clustering
err = domainCtx.retrieveDeviceNodeName()
if err != nil {
log.Fatal(err)
You can't fatal here since you only wait if hvTypeKube.
How about making Eden/Adam send the EdgeNodeInfo and wait even if not hvTypeKube?
That would remove some special cases like this issue.
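For illustration, a non-fatal alternative could wait for EdgeNodeInfo with a bounded timeout regardless of hypervisor type and degrade gracefully instead of calling log.Fatal (the channel below stands in for the pubsub notification; a sketch only, not the PR's code):

```go
// Sketch only: bounded wait for the node name, continuing without it on timeout.
package main

import (
	"log"
	"time"
)

func waitForEdgeNodeInfo(infoCh <-chan string, timeout time.Duration) (string, bool) {
	select {
	case name := <-infoCh:
		return name, true
	case <-time.After(timeout):
		return "", false
	}
}

func main() {
	infoCh := make(chan string, 1)
	infoCh <- "edge-node-1"
	if name, ok := waitForEdgeNodeInfo(infoCh, 5*time.Minute); ok {
		log.Printf("node name: %s", name)
	} else {
		// Not fatal: continue without a node name and retry later.
		log.Printf("EdgeNodeInfo not received yet; continuing")
	}
}
```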
@@ -1103,6 +1106,16 @@ func initPublications(zedagentCtx *zedagentContext) {
}
getconfigCtx.pubZedAgentStatus.ClearRestarted()

zedagentCtx.pubEdgeNodeClusterConfig, err = ps.NewPublication(pubsub.PublicationOptions{
AgentName: agentName,
Persistent: true,
Why does this need to be persistent? That makes it much more painful to add fields to the EdgeNodeClusterConfig in future EVE releases.
isDNiDnode := false
if aiConfig.DesignatedNodeID != uuid.Nil && aiConfig.DesignatedNodeID == ctx.nodeUUID {
isDNiDnode = true
Should this be set to true on a single node cluster? For a non-kubevirt case?
It would make sense to add some comment about the expectations in those cases.
st, _ := sub.Get(key)
if st != nil {
clusterStatus := st.(types.ENClusterAppStatus)
if !clusterStatus.ScheduledOnThisNode {
No publishing seems odd.
I can understand that we want the controller to ignore Info messages (and perhaps also Metrics messages) for app instances which are not scheduled on this node, but it would make more sense to either put that check in the controller (which might require a some new fields in the Info and Metrics protobuf messages) or put the code in zedagent to exclude sending Info and Metrics for such an app instance. But publish here for other agents to see a Status for each Config.
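Sketched out, that suggestion amounts to always publishing a Status for each Config and only filtering on the reporting side (simplified stand-in types; illustrative only, not the PR's code):

```go
// Sketch only: publish cluster app status unconditionally, and let zedagent
// skip Info/Metrics for app instances not scheduled on this node.
package main

import "fmt"

type ENClusterAppStatus struct {
	AppUUID             string
	ScheduledOnThisNode bool
}

// shouldSendAppInfo is the zedagent-side filter: only report Info/Metrics
// for app instances scheduled on this node.
func shouldSendAppInfo(st ENClusterAppStatus) bool {
	return st.ScheduledOnThisNode
}

func main() {
	st := ENClusterAppStatus{AppUUID: "uuid-1", ScheduledOnThisNode: false}
	if shouldSendAppInfo(st) {
		fmt.Println("send AppInstanceInfo to controller")
	} else {
		fmt.Println("skip Info/Metrics for app not scheduled on this node")
	}
}
```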
@@ -1563,6 +1617,14 @@ func updateBasedOnProfile(ctx *zedmanagerContext, oldProfile string) {
}

// returns effective Activate status based on Activate from app instance config and current profile
Comment needs to be updated.
}

if ctx.nodeUUID == uuid.Nil {
err := getnodeNameAndUUID(ctx)
Please fetch this when zedmanager starts and not here.
@@ -282,8 +282,8 @@ func waitForNodeReady(client *kubernetes.Clientset, readyCh chan bool, devUUID s
if err != nil {
return err
}
if len(pods.Items) < 6 {
return fmt.Errorf("kubevirt running pods less than 6")
if len(pods.Items) < 4 { // need at least 4 pods to be running
Where does the magic number 4 come from? Please expand on comment.
@@ -1378,3 +1707,33 @@ func (ctx kubevirtContext) VirtualTPMTerminate(domainName string, wp *types.Watc
func (ctx kubevirtContext) VirtualTPMTeardown(domainName string, wp *types.WatchdogParam) error {
return fmt.Errorf("not implemented")
}

func getMyNodeUUID(ctx *kubevirtContext, nodeName string) {
Reads more like a "set" or "save" function than a "get" function.
### Handle Domain Apps Status in domainmgr

When the application is launched and managed in KubeVirt mode, the Kubernetes cluster is provisioned for this application as either a VMI (Virtual Machine Instance) replicaSet object or a Pod replicaSet object. It uses a declarative approach to manage the desired state of the applications. The configurations are saved in the Kubernetes database for the Kubernetes controller to use to ensure the objects eventually achieve the correct state if possible. Any particular VMI/Pod state of a domain may not be in working condition at the time when EVE domainmgr checks. In the domainmgr code running in KubeVirt mode, if it can not contact the Kubernetes API server to query about the application, or if the application itself has not been started yet in the cluster, kubevirt.go will return the 'Unknown' status. It keeps an 'Unknown' status starting timestamp per application. If the 'Unknown' status lasts longer than 5 minutes, the status functions in kubevirt.go will return 'Halt' status back to domainmgr. The timestamp will be cleared once it can get the application status from kubernetes.
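For reference, a sketch of the described timestamp handling (names are illustrative, not the PR's exact code):

```go
// Sketch only: track when an app's status first became Unknown and fall back
// to a halting state once that has lasted longer than five minutes.
package main

import (
	"fmt"
	"time"
)

type SwState int

const (
	UNKNOWN SwState = iota
	RUNNING
	HALTING
)

const unknownStateTimeout = 5 * time.Minute

// unknownSince records, per application, when the Unknown status started.
var unknownSince = make(map[string]time.Time)

// effectiveState returns the state to report to domainmgr: Unknown is kept
// for a while to let the cluster catch up, then degraded to HALTING.
func effectiveState(appName string, reported SwState, now time.Time) SwState {
	if reported != UNKNOWN {
		delete(unknownSince, appName) // clear the timestamp once a real status arrives
		return reported
	}
	start, ok := unknownSince[appName]
	if !ok {
		unknownSince[appName] = now
		return UNKNOWN
	}
	if now.Sub(start) > unknownStateTimeout {
		return HALTING
	}
	return UNKNOWN
}

func main() {
	now := time.Now()
	fmt.Println(effectiveState("app1", UNKNOWN, now))                    // stays UNKNOWN
	fmt.Println(effectiveState("app1", UNKNOWN, now.Add(6*time.Minute))) // degrades to HALTING
	fmt.Println(effectiveState("app1", RUNNING, now.Add(7*time.Minute))) // back to RUNNING
}
```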
Reading the code it looks like it will return "Halting" and not "Halt" status.
@@ -1138,6 +1175,11 @@ func maybeRetryBoot(ctx *domainContext, status *types.DomainStatus) {
log.Errorf("Failed to setup vTPM for %s: %s", status.DomainName, err)

// pass nodeName to hypervisor call Setup
if status.NodeName == "" {
Can't NodeName be set when DomainStatus is created in handleCreate?
@@ -1684,6 +1726,11 @@ func doActivate(ctx *domainContext, config types.DomainConfig,
log.Errorf("Failed to setup vTPM for %s: %s", status.DomainName, err)

// pass nodeName to hypervisor call Setup
if status.NodeName == "" {
Ditto